Yet Another Ranking Function for Automatic Multiword Term Extraction

نویسندگان

  • Juan Antonio Lossio Ventura
  • Clement Jonquet
  • Mathieu Roche
  • Maguelonne Teisseire
چکیده

Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure is both linguistic and statistical based. The second measure is graph-based, allowing assessment of the importance of a multiword term of a domain. Existing measures often solve some problems related (but not completely) to term extraction, e.g., noise, silence, low frequency, large-corpora, complexity of the multiword term extraction process. Instead, we focus on managing the entire set of problems, e.g., detecting rare terms and overcoming the low frequency issue. We show that the two proposed measures outperform precision results previously reported for automatic multiword extraction by comparing them with the state-of-the-art reference measures.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Inference of Base Forms for Multiword Terms in Lithuanian

This paper reports on a specific problem of automatic terminology extraction in Lithuanian – base form inference. While the process of lemmatisation is properly carried out by existing tools, problems arise with normalizing multiword terms. It can be described as the discrepancy between the base form (i. e. lemma) of a term and the sequence of the base forms of constituent lexical items within ...

متن کامل

Clustering-based Approach to Multiword Expression Extraction and Ranking

We present a domain-independent clusteringbased approach for automatic extraction of multiword expressions (MWEs). The method combines statistical information from a general-purpose corpus and texts from Wikipedia articles. We incorporate association measures via dimensions of data points to cluster MWEs and then compute the ranking score for each MWE based on the closest exemplar assigned to a...

متن کامل

A method for automatic extraction of multiword units representing business aspects from user reviews

The paper describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words representing the target category, and calculates distributional similarity between the candidate and seed words. We compare three distributional similarity measures (Lin’s, Weeds’ and balAPinc), and a document re...

متن کامل

TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction

The manual identification of terminology from specialized corpora is a complex task that needs to be addressed by flexible tools, in order to facilitate the construction of multilingual terminologies which are the main resources for computer-assisted translation tools, machine translation or ontologies. The automatic terminology extraction tools developed so far either use a proprietary code or...

متن کامل

MULTILINGUAL MULTIWORD EXPRESSIONS Literature Survey

Multiword Expressions are idiosyncratic word usages of a language which often have noncompositional meaning. The knowledge of multiword expressions is necessary for many NLP tasks like, machine translation, natural language generation, named entity recognition, sentiment analysis etc. In order for other NLP applications to benefit from the knowledge of multiword expressions, they need to be ide...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014